Exercise 2

In this exercise, you will implement logistic regression and apply it to two different datasets. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.


In [ ]:
# Import numpy for linear algebra and numerical computing functions, and matplotlib for plotting graphs
import numpy as np
from numpy import ones, zeros, newaxis, r_, c_, mat, dot, e, size, log
from numpy.linalg import pinv
from scipy import optimize
import matplotlib.pyplot as plt

from IPython.html import widgets
from IPython.html.widgets import *
from IPython.display import display

# Enable matplotlib inline plotting for this notebook
%matplotlib inline

Logistic Regression

In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university. Suppose that you are the administrator of a university department and you want to determine each applicant's chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant's scores on two exams and the admissions decision. Your task is to build a classification model that estimates an applicant's probability of admission based on the scores from those two exams. This outline will guide you through the exercise.


In [ ]:
data = np.loadtxt('../data/ex2data1.txt', delimiter=',')
X = mat(data[:, :2]) # training example inputs
y = c_[data[:, 2]]   # training example outputs
m = X.shape[0]

data[:10] # First 10 rows of training examples (just for viewing)

Visualizing the data

Before starting to implement any learning algorithm, it is always good to visualize the data if possible. The code below displays the data on a 2-dimensional plot by calling the function plot_data. You will now complete the code in plot_data so that it displays a figure where the axes are the two exam scores, and the positive and negative examples are shown with different markers.
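As a hint, one common approach is to build boolean masks for the positive and negative examples and plot each group with its own marker. Here is a minimal sketch (assuming X is a plain array of shape Mx2 and y is a column vector of 0/1 labels; your own implementation may differ):

pos = (y == 1).ravel()                  # mask selecting the admitted examples
neg = (y == 0).ravel()                  # mask selecting the not-admitted examples
plt.plot(X[pos, 0], X[pos, 1], 'k+')    # positive examples as black +
plt.plot(X[neg, 0], X[neg, 1], 'ko')    # negative examples as black o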


In [ ]:
def plot_data(X, y):
    """"Plots the data points x and y into a new figure
    plots the data points with + for positive examples
    and o for negative examples. X is assumed to be a
    Mx2 matric.
    """

    ### YOUR CODE HERE ###
    # Instructions: Plot the positive and negative examples on a
    #               2D plot, using the option 'k+' for the positive
    #               examples and 'ko' for the negative examples.
    #
    # Hint: To get help with plotting in IPython notebooks, start typing
    #       plt.plot( and then press Shift-Tab
    #       Pressing Shift-Tab multiple times will give more verbose help options
    
    pass

In [ ]:
# When you are done implementing plot_data(X, y) run this cell
# This will take the X array (the two exam scores) and plot the admitted and
# not-admitted examples in y with different markers
plot_data(X.A, y)
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend(['Admitted', 'Not admitted'])

Implementation

Warmup exercise: sigmoid function

Before you start with the actual cost function, recall that the logistic regression hypothesis is defined as:

$$h_\theta(x) = g\left(\theta^Tx\right)$$

where function $g$ is the sigmoid function. The sigmoid function is defined as:

$$g(z) = \frac{1}{1+e^{-z}}$$

Your first step is to implement this function so it can be called by the rest of your program. When you are finished, try testing a few values by calling sigmoid(x) in the cells that follow. For large positive values of x, the sigmoid should be close to 1, while for large negative values, the sigmoid should be close to 0. Evaluating sigmoid(0) should give you exactly 0.5. Your code should also work with vectors and matrices. For a matrix, your function should apply the sigmoid to every element.
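For example, numpy's element-wise exp makes a fully vectorized sigmoid a one-liner (a minimal sketch; your own implementation may of course differ):

g = 1.0 / (1.0 + np.exp(-z))   # np.exp broadcasts over scalars, vectors and matrices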


In [ ]:
def sigmoid(z):
    """Compute sigmoid function
    computes the sigmoid of z.
    """

    # You need to return the following variables correctly
    g = zeros(size(z))

    ### YOUR CODE HERE ###
    # Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    #               vector or scalar).
    
    return g

In [ ]:
# try testing a few values by calling sigmoid(x)
print("Sigmoid of 0 should be 0.5, Actual = {}".format(sigmoid(0)))
print("Sigmoid of large postive values should be close to 1, Actual = {}".format(sigmoid(1000)))
print("Sigmoid of large negative values should be close to 0, Actual = {}".format(sigmoid(-1000)))

# z can be a matrix, vector or scalar
print("matrix: {}".format(sigmoid(np.mat([0, 1000, -1000]))))
print("vector: {}".format(sigmoid(np.array([0, 1000, -1000]))))
print("scalar: {}".format(sigmoid(0)))

Cost function and gradient

Now you will implement the cost function and gradient for logistic regression. Complete the code in the following cell to return the cost and gradient.

Recall that the cost function in logistic regression is

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]$$

and the gradient of the cost is a vector of the same length as $\theta$ where the $j^{th}$ element (for $j = 0,1,\ldots,n$) is defined as follows:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

Note that while this gradient looks identical to the linear regression gradient, the formula is actually different because linear and logistic regression have different definitions of $h_\theta(x)$.
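If you want to avoid explicit loops, both quantities have compact vectorized forms. Here is a minimal sketch of the body of cost_function (assuming X is the numpy matrix that already contains the column of ones, y is an m x 1 column vector, and your sigmoid works element-wise; your own implementation may differ):

h = sigmoid(X * c_[theta])                                  # hypothesis h_theta(x), an m x 1 column
J = ((-y.T * log(h) - (1 - y).T * log(1 - h)) / m).item(0)  # scalar cost
grad = (X.T * (h - y) / m).A1                               # gradient as a flat array, same length as theta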


In [ ]:
def cost_function(theta, X, y):
    """Compute cost and gradient for logistic regression
    computes the cost of using theta as the parameter for
    logistic regression and the gradient of the cost
    with respect to the parameters.
    """
    
    # Initialize some useful values
    m = y.shape[0] # number of training examples
    
    # You need to return the following variables correctly
    J = 0
    grad = zeros(np.size(theta))
    
    ### YOUR CODE HERE ###
    # Instructions: Compute the cost of a particular choice of theta.
    #               You should set J to the cost.
    #               Compute the partial derivatives and set grad to the partial
    #               derivatives of the cost w.r.t. each parameter in theta
    #
    # Note: grad should have the same dimensions as theta
    #
    
    # Note: if J and grad end up as numpy matrices, you can return plain values, e.g. J.item(0) and grad.A[0]
    return J, grad

Once you are done, the following cell will call your cost_function using the initial parameters $\theta = \mathbf{0}$. You should see that the cost is about 0.693.
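This value is easy to check by hand: with $\theta = \mathbf{0}$, every prediction is $h_\theta(x^{(i)}) = g(0) = 0.5$, so each term of the sum equals $-\log(0.5)$ and

$$J(\mathbf{0}) = \frac{1}{m}\sum_{i=1}^{m}\left[-\log(0.5)\right] = -\log(0.5) \approx 0.693$$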


In [ ]:
X = mat(data[:, :2]) # Original X
X = c_[ones(m), X] # Add a column of ones for the intercept term
theta = zeros(X.shape[1])

cost, grad = cost_function(theta, X, y)
print(cost)
print(grad)

Learning parameters using fmin

In the previous assignment, you found the optimal parameters of a linear regression model by implementing gradient descent. You wrote a cost function and calculated its gradient, then took a gradient descent step accordingly. This time, instead of taking gradient descent steps, you will use a SciPy function called fmin.

SciPy's optimize module contains built-in optimization solvers that find the minimum of an unconstrained function. For logistic regression, you want to optimize the cost function $J(\theta)$ with parameters $\theta$.

Concretely, you are going to use fmin to find the best parameters $\theta$ for the logistic regression cost function, given a fixed dataset (of X and y values). You will pass to fmin the following inputs:

  • The initial values of the parameters we are trying to optimize.
  • A function that, when given the training set and a particular $\theta$, computes the logistic regression cost and gradient with respect to $\theta$ for the dataset (X, y)

The call to fmin will look like this:

#  Set options for fmin
options = {'full_output': True, 'maxiter': 400}

#  Run fmin to obtain the optimal theta
#  With full_output=True, fmin returns the optimal theta, the final cost J,
#  the number of iterations, the number of function calls, and a warning flag
theta, cost, _, _, _ = optimize.fmin(lambda t: cost_function(t, X, y)[0], initial_theta, **options)

Here we first defined the options to be used with fmin. Setting full_output to True makes fmin return, in addition to the optimal parameters, the final cost, the number of iterations, the number of function calls, and a warning flag. Furthermore, we set the maxiter option to 400, so that fmin will run for at most 400 iterations before it terminates.

To specify the actual function we are minimizing, we wrap our cost function in a lambda expression:

lambda t: cost_function(t, X, y)[0], initial_theta

This creates a function, with argument t, which calls your cost_function. This allows us to wrap the cost function for use with fmin. The [0] is needed because our cost_function returns a tuple of (cost, gradient) and fmin expects only the cost (its simplex algorithm does not use the gradient at all).

The underscores in theta, cost, _, _, _ = ... are for specifying that those parts of the returned result can be ignored. The full output also returns the number of iterations the algorithm took, the number of function calls made, and a warning flag. Therefore an equivalent way of writing this would be theta, cost, iters, funcalls, warnflag = optimize.fmin(...

If you have completed the cost_function correctly, fmin will converge on the right optimization parameters and return the final values of the cost and $\theta$. Notice that by using fmin, you did not have to write any loops yourself, or set a learning rate like you did for gradient descent. This is all done by fmin: you only needed to provide a function calculating the cost.

Note: Constraints in optimization often refer to constraints on the parameters, for example, constraints that bound the possible values θ can take (e.g., θ ≤ 1). Logistic regression does not have such constraints since θ is allowed to take any real value.


In [ ]:
X = mat(data[:, :2]) # Original X
X = c_[ones(m), X] # Add a column of ones for the intercept term
initial_theta = zeros(X.shape[1])
options = {'full_output': True, 'maxiter': 400}

theta, cost, _, _, _ = optimize.fmin(lambda t: cost_function(t, X, y)[0], initial_theta, **options)

Once fmin completes, it will report the final value of the cost, which should be about 0.203. The next cell then uses this final $\theta$ value to plot the decision boundary on the training data. We also encourage you to look at the code in that cell to see how to plot such a boundary using the $\theta$ values.
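The boundary itself is the set of points where $h_\theta(x) = 0.5$, i.e. where $\theta^Tx = 0$. With two exam-score features this is the straight line

$$\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0 \quad\Longleftrightarrow\quad x_2 = -\frac{\theta_0 + \theta_1 x_1}{\theta_2}$$

which is exactly the line that plot_decision_boundary draws between two endpoints chosen for $x_1$.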


In [ ]:
def plot_decision_boundary(theta, X, y):
    # Plot the training data (skipping the intercept column)
    plot_data(X[:, 1:3], y)

    if X.shape[1] <= 3:
        # With only two features the boundary theta0 + theta1*x1 + theta2*x2 = 0 is a line:
        # pick two endpoints for x1 (Exam 1 score) and solve for x2 (Exam 2 score)
        plot_x = r_[X[:, 1].min() - 2, X[:, 1].max() + 2]
        plot_y = (-1. / theta[2]) * (theta[1] * plot_x + theta[0])

        plt.plot(plot_x, plot_y)
        plt.legend(['Admitted', 'Not admitted', 'Decision Boundary'])
        plt.axis([30, 100, 30, 100])
    else:
        # Boundaries for more than two features are not drawn here
        pass

plot_decision_boundary(theta, X.A, y)

Evaluating logistic regression

After learning the parameters, you can use the model to predict whether a particular student will be admitted. For a student with an Exam 1 score of 45 and an Exam 2 score of 85, you should expect to see an admission probability of 0.776.
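In other words, plugging this student's scores into the hypothesis with the learned parameters gives

$$h_\theta(x) = g\left(\theta_0 + 45\,\theta_1 + 85\,\theta_2\right) \approx 0.776$$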

Another way to evaluate the quality of the parameters we have found is to see how well the learned model predicts on our training set. In this part, your task is to complete the code in the next cell. The predict function will produce “1” or “0” predictions given a dataset and a learned parameter vector $\theta$.
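If you get stuck, one possible approach is to threshold the hypothesis at 0.5 in a single vectorized step. A minimal sketch (assuming X is the numpy matrix that already includes the intercept column and your sigmoid works element-wise; your own implementation may differ):

p = (sigmoid(X * c_[theta]) >= 0.5).A1.astype(int)   # 1 where h_theta(x) >= 0.5, else 0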


In [ ]:
def predict(theta, X):
    """Predict whether the label is 0 or 1 using learned logistic 
    regression parameters theta
    computes the predictions for X using a threshold at 0.5
    (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)
    """

    # You need to return the following variables correctly
    p = zeros(X.shape[0])

    ### YOUR CODE HERE ###
    # Instructions: Complete the following code to make predictions using
    #               your learned logistic regression parameters. 
    #               You should set p to a vector of 0's and 1's
    
    return p

After you have completed the code, the next cell will report the training accuracy of your classifier by computing the percentage of examples it classified correctly.


In [ ]:
prob = sigmoid(mat('1 45 85') * c_[theta])
print("For a student with scores 45 and 85, we predict an admission probability of {:.1%}".format(prob.item(0)))

predictions = predict(theta, X)
accuracy = (predictions == y.ravel()).mean()  # compare against the labels as a flat vector
print('Training Accuracy: {:.1%}'.format(accuracy))